Electric scooters have taken cities by storm. They are loved by many, but are also a source of controversy. I myself am an avid user of the scooters: they are easy to use and have saved me quite some time. Most of the scooter companies have easily accessible APIs where they share the location data of each of their scooters. I have collected data on scooters in Oslo for some weeks, and we will now try to get an overview and see what the data can reveal about how they are used.

An explanation of how to connect to the different APIs can be found in the following GitHub repository: https://github.com/ubahnverleih/WoBike

The Entur endpoint was also used, and its documentation can be found on the following site: https://developer.entur.org/pages-mobility-docs-scooters

We first plot a sample of the positions over maps from Kartverket (https://www.kartverket.no/data/) and from OpenStreetMap Data (https://osmdata.openstreetmap.de/data/land-polygons.html), getting:

Scooter movements downtown

Scooter movements

300x300m grid, more purple means higher density of scooters

Average scooter power

300x300m grid, more green means higher average power level in area

It should be mentioned that the endpoints can be somewhat inconsistent. For instance, as will be seen later, Voi’s data in the Entur endpoint seems to have stopped logging after Jul 29.

For Jul 27, data was collected every 30 seconds, while for all other days data was collected every 10 minutes.

We first study the data on a 10 minute interval.

# Packages used throughout the analysis
library(dplyr)
library(ggplot2)
library(plotly)

# Reading data
df <- read.csv("processed-data/output10min.csv")

# Removing index columns
df <- df[,-c(1,2)]

# Add columns about time stamps (format passed by name so it is not mistaken for tz)
df$time <- as.POSIXct(df$time, format = "%Y-%m-%d %H:%M:%S", tz="Europe/Paris")
df$day <- strftime(df$time, format = "%d")
df$month <- strftime(df$time, format = "%m")
df$year <- strftime(df$time, format = "%Y")
df$hour <- strftime(df$time, format = "%H")

Printing a summary of the data we get the following:

summary(df)
##     Distance        ScooterModel           id               id2         
##  Min.   :   1.8           :2632461   407    :   1976          : 417790  
##  1st Qu.: 455.9    Circ B1: 417790   171    :   1975   ZVS1353:   1976  
##  Median : 817.4                      207    :   1975   ZVS1108:   1975  
##  Mean   : 888.5                      244    :   1975   ZVS1128:   1975  
##  3rd Qu.:1225.2                      262    :   1975   ZVS1145:   1975  
##  Max.   :4574.5                      268    :   1975   ZVS1191:   1975  
##  NA's   :2632461                     (Other):3038400   (Other):2622585  
##              idScooterState        index         isBookable     
##                     :2632461   Min.   : 0.0           :2632461  
##  DEPLOYED_FOR_RENTAL: 417790   1st Qu.:11.0      False: 417790  
##                                Median :24.0                     
##                                Mean   :24.1                     
##                                3rd Qu.:37.0                     
##                                Max.   :49.0                     
##                                NA's   :2632461                  
##       lat             lon          operator           power       
##  Min.   : 0.00   Min.   :-3.762   circ : 417790   Min.   :  0.00  
##  1st Qu.:59.91   1st Qu.:10.731   tier :1092635   1st Qu.: 50.00  
##  Median :59.91   Median :10.751   voi  :1352087   Median : 75.00  
##  Mean   :51.54   Mean   :17.177   zvipp: 187739   Mean   : 69.86  
##  3rd Qu.:59.93   3rd Qu.:10.769                   3rd Qu.: 93.00  
##  Max.   :61.12   Max.   :85.381                   Max.   :100.00  
##                                                                   
##    rangeLeft            time                         txt_code      
##         :2632461   Min.   :2019-07-21 16:44:27           :2632461  
##  25 km  :  59792   1st Qu.:2019-07-23 22:34:27   439803.0:   1002  
##  22 km  :  25227   Median :2019-07-26 04:54:27   527849.0:    958  
##  15 km  :  24246   Mean   :2019-07-26 19:38:27   381636.0:    953  
##  23 km  :  23209   3rd Qu.:2019-07-28 18:24:27   483376.0:    952  
##  19 km  :  23187   Max.   :2019-08-04 10:44:27   917980.0:    951  
##  (Other): 262129                                 (Other) : 412974  
##     last_lat        last_lon      distanceTravelled      day           
##  Min.   : 0.00   Min.   :-3.762   Min.   :       0   Length:3050251    
##  1st Qu.:59.91   1st Qu.:10.731   1st Qu.:       0   Class :character  
##  Median :59.91   Median :10.751   Median :       2   Mode  :character  
##  Mean   :51.54   Mean   :17.177   Mean   :   38040                     
##  3rd Qu.:59.93   3rd Qu.:10.769   3rd Qu.:      10                     
##  Max.   :61.12   Max.   :85.381   Max.   :19987315                     
##  NA's   :5646    NA's   :5646                                          
##     month               year               hour          
##  Length:3050251     Length:3050251     Length:3050251    
##  Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character  
##                                                          
##                                                          
##                                                          
## 

We see, however, that there are a good deal of outliers, e.g. in the lat and lon variables. We also see that not all operators log the same fields, e.g. Distance and ScooterModel are only reported for some of the observations. The fields id, lat, lon, operator, power and time are however quite consistently reported.
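One possible way to drop such position outliers is a simple bounding-box filter. The sketch below uses toy observations and an assumed rough box around Oslo; the box limits and the names `obs`, `oslo_box` and `obs_clean` are illustrative and not taken from the original cleaning step.

```r
# Toy observations; the first and last rows mimic the outliers seen in the summary
obs <- data.frame(
  lat = c(0.00, 59.91, 59.93, 61.12),
  lon = c(0.00, 10.75, 10.77, 85.38)
)

# Assumed rough bounding box around Oslo (illustrative values only)
oslo_box <- list(lat = c(59.8, 60.0), lon = c(10.5, 11.0))

# Keep only rows whose position falls inside the box
in_box <- obs$lat >= oslo_box$lat[1] & obs$lat <= oslo_box$lat[2] &
  obs$lon >= oslo_box$lon[1] & obs$lon <= oslo_box$lon[2]
obs_clean <- obs[in_box, ]
```

A fixed box is crude, but it removes zero coordinates and far-away points without needing any map data.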

last_lat and last_lon were added in the data cleaning part and represent the observed position of the scooter in the interval before the selected row.

We have data on four operators: circ, tier, voi and zvipp. There are over one million scooter observations each for tier and voi, and a few hundred thousand scooter observations each for circ and zvipp.

So how many scooters are there actually? We plot the number of unique scooter ids over time to find out.

# Number of unique scooters per operator at each logged time stamp
df.unique <- df %>% group_by(operator,time) %>% summarise(n = n_distinct(id))
p <- ggplot(df.unique, aes(x=time, y = n, color = operator)) +
  geom_line() +
  ggtitle("Unique scooter ids in Oslo") + 
  ylab("Number of unique scooter ids") + 
  xlab("Time") + 
  scale_colour_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)

Voi seems to be the largest operator, nearing 2000 unique scooter ids at its peak, while tier is the second largest, nearing 1250 unique ids at its peak. tier and zvipp seem to close their networks around 20:00 and open again around 01:00. Voi and circ seem to keep most scooters active at all times, with the fewest active ones around 03:00.

We also see that the Voi collection stopped around Jul 29.
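A gap like this can also be spotted without a plot by looking at the last logged time stamp per operator. The following is a minimal sketch on made-up data; the `toy` data frame and its time stamps are hypothetical and only mirror the shape of the `operator` and `time` columns.

```r
# Hypothetical observations with the same column names as the real data
toy <- data.frame(
  operator = c("tier", "tier", "voi", "voi"),
  time = as.POSIXct(c("2019-07-28 12:00:00", "2019-08-04 10:40:00",
                      "2019-07-28 12:00:00", "2019-07-29 09:10:00"),
                    tz = "UTC")
)

# Last date on which each operator was observed
last_seen <- sapply(split(toy$time, toy$operator),
                    function(t) format(max(t), "%Y-%m-%d"))
last_seen
```

An operator whose last date is well before the end of the collection period has most likely dropped out of the endpoint.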

Number of trips

We first look at scooter movement in between the positions being logged. In the data cleaning step we calculated the distance moved between consecutive time intervals using the Haversine function. This yields measurements as the crow flies (a city is often more complex than that), but it makes for easier calculation. The results are logged in the distanceTravelled column. To filter out inactive and inconsistent movement we ignore scooters logged to have moved under 50m or over 15000m. We should then be left with actual movements. An overview is found in the next graph:
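For reference, a Haversine distance can be sketched in a few lines of base R. This is not the code from the cleaning step, which is not shown here; the function name `haversine_m` and the spherical Earth radius of 6371 km are assumptions made for this illustration.

```r
# Great-circle distance in metres between points given in decimal degrees.
# Assumption: a spherical Earth with mean radius 6371 km.
haversine_m <- function(lat1, lon1, lat2, lon2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Example: two points in central Oslo (coordinates chosen for illustration)
haversine_m(59.9110, 10.7500, 59.9139, 10.7522)
```

The function is vectorised, so it could be applied directly to columns such as last_lat, last_lon, lat and lon.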

df.trips <- df[df$distanceTravelled > 50 & df$distanceTravelled < 15000,]
p <- ggplot(df.trips, aes(x = distanceTravelled, fill = operator)) + 
  geom_histogram(colour='black',size=0.1, breaks = seq(0,15000,500)) + 
  xlab("Distance travelled") + 
  ylab("Count") + 
  ggtitle("Scooter movements count on distance") +
  scale_fill_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)
p <- ggplot(df.trips, aes(x = distanceTravelled, fill = operator)) + 
  geom_histogram(colour='black',size=0.1, breaks = seq(0,15000,500), position="fill") + 
  xlab("Distance travelled") + 
  ylab("Share of movements") + 
  ggtitle("Scooter movements density on distance") +
  scale_fill_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)

We see that most movements are under 2500m. Most scooters have a top speed of 20km/h, so in ten minutes a scooter can travel at most about 3.3km. Movements longer than that are therefore probably due to faulty measurements or other types of movement.
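The upper bound on distance per interval is simple arithmetic, sketched here so it can be reused as a filter. The `moved_m` values are hypothetical, and the 20km/h top speed is the figure assumed in the text.

```r
top_speed_kmh <- 20   # top speed assumed in the text
interval_min  <- 10   # logging interval in minutes

# Longest distance (m) a scooter can cover in one interval at top speed
max_distance_m <- top_speed_kmh * 1000 * interval_min / 60

# Hypothetical logged movements (m); TRUE marks physically implausible rides
moved_m <- c(120, 2400, 3300, 9000)
implausible <- moved_m > max_distance_m
```

Here `max_distance_m` comes out at roughly 3333m, so only the last toy movement is flagged.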

Interestingly, we see that even though tier has only a bit over half as many scooters as voi at peak, they account for a much larger percentage of the movements under 2500m. This is probably due to their lower price. I exclusively use Tier, as they seem to be the cheapest and have a relatively big scooter fleet.

A scooter can however be used for more than ten minutes. To estimate a trip we therefore cluster consecutive time intervals: a movement of over 50m and less than 3000m between two intervals is taken as evidence that they are part of the same trip.
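Before the full implementation, the idea can be illustrated on a toy series for a single scooter. This is a simplified variant that treats each run of consecutive "moving" intervals as one trip; it is not the exact flagging logic used on the real data, and the vector `d` is made up.

```r
# Hypothetical distances (m) moved by one scooter in consecutive 10-minute intervals
d <- c(0, 3, 400, 900, 10, 0, 1200, 5, 20000, 0)

# An interval counts as movement when the distance is between 50 m and 3000 m
moving <- d > 50 & d < 3000

# Each run of consecutive moving intervals is counted as one trip
runs <- rle(moving)
n_trips <- sum(runs$values)

# Label each moving interval with its trip number and sum the distance per trip
trip_id <- cumsum(c(moving[1], diff(moving) == 1))
trip_lengths_m <- tapply(d[moving], trip_id[moving], sum)
```

In this toy series the 20000m jump is ignored as inconsistent, leaving two trips of 1300m and 1200m.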

# Finding start and end of trips
# An interval counts as movement when the scooter moved between 50 m and 3000 m
df.trip <- df %>%
  group_by(operator, id) %>%
  mutate(
    # End: this interval has movement, the next one does not
    isEnd = ifelse((lead(distanceTravelled) < 50 | lead(distanceTravelled) > 3000) &
                     (distanceTravelled > 50 & distanceTravelled < 3000), 1, 0),
    # Start: this interval has no movement, the next one does
    isStart = ifelse((distanceTravelled < 50 | distanceTravelled > 3000) &
                       (lead(distanceTravelled) > 50 & lead(distanceTravelled) < 3000), 1, 0)) %>%
  ungroup()

# Finding intervals that are part of a trip, in between start and end
df.trip <- df.trip %>%
  group_by(operator, id) %>%
  mutate(isMid = ifelse((lead(isStart) == 1 | (lead(distanceTravelled) > 50 & lead(distanceTravelled) < 3000)) &
                          (lag(distanceTravelled) > 50 & lag(distanceTravelled) < 3000), 1, 0)) %>%
  ungroup()
df.trip$monthDay <- strftime(df.trip$time, format = "%m/%d")
df.start <- df.trip[df.trip$isStart == 1,]
df.start <- df.start[!is.na(df.start$time),]
p <- ggplot(df.start, aes(x = monthDay, fill = operator)) +
  geom_bar(position='dodge', colour='black', size=0.02) +
  xlab("Day") +
  ylab("Count") +
  ggtitle("Estimated scooter trips per day") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)

Here we again see that tier and voi have close to the same number of trips even though voi has a much bigger fleet.

We note that the data is the most consistent (when considering all operators) for 24/07 to 27/07.

So how many times a day is a scooter on average in use?

df.start.perScooter <- df.trip[!is.na(df.trip$isStart),]
df.start.perScooter <- df.start.perScooter %>% 
  group_by(operator, monthDay) %>%
  summarise(averageTripsPerScooter = sum(isStart) / length(unique(id))) %>%
  ungroup()

p <- ggplot(df.start.perScooter, aes(x = monthDay, y = averageTripsPerScooter, fill=operator), colour='black',size=0.02) + 
  geom_bar(stat="identity", position=position_dodge(), colour='black',size=0.02) +
  xlab("Day") + 
  ylab("Average trips per scooter") + 
  ggtitle("Estimated average trips per scooter") + 
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))

ggplotly(p)

A scooter seems to be used four times a day on average, and Tier scooters seem to be used the most, while Voi and Circ scooters seem to be used the least.

The results here should be taken with a grain of salt. The scooters might be moved around without it being a trip; for example, the scooters are often relocated at the start of a day. (I have never seen this happen, however the scooters tend to be neatly organized in the mornings and I doubt it is the work of scooter riders.)

We can further look at the time of day when the scooters are used:

df.start.perScooter <- df.trip[!is.na(df.trip$isStart),]
df.start.perScooter <- df.start.perScooter %>% 
  group_by(operator, hour) %>%
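  # n_distinct counts unique (month, day) pairs; length(unique(cbind(...))) would count matrix cells, i.e. twice as many
  summarise(averageTripsPerScooter = sum(isStart) / length(unique(id)) / n_distinct(month, day)) %>%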
  ungroup()


p <- ggplot(df.start.perScooter, aes(x = hour, y = averageTripsPerScooter, fill=operator), colour='black',size=0.02) + 
  geom_bar(stat="identity", colour='black',size=0.02) +
  xlab("Hour of day") + 
  ylab("Average trips per scooter") + 
  ggtitle("Estimated average trips per scooter (adjusted for days logged)") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)

Here we see that during the logging period, most scooter activity happened during the afternoon. Some movement, especially in the evenings and mornings, might be due to relocation, charging and other maintenance.

And lastly we can look at their use by day of the week.

df.start.perScooter <- df.trip[!is.na(df.trip$isStart),]
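# Keep the calendar date before the day column is overwritten with weekday names
df.start.perScooter$date <- strftime(df.start.perScooter$time, format = "%Y-%m-%d")
df.start.perScooter$day <- weekdays(df.start.perScooter$time, abbreviate = T)
df.start.perScooter <- df.start.perScooter %>% 
  group_by(operator, day) %>%
  # Divide by the number of distinct calendar dates logged for each weekday
  summarise(averageTripsPerScooter = sum(isStart) / length(unique(id)) / n_distinct(date)) %>%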
  ungroup()


p <- ggplot(df.start.perScooter, aes(x = day, y = averageTripsPerScooter, fill=operator), colour='black',size=0.02) + 
  geom_bar(stat="identity", colour='black',size=0.02,position='dodge') +
  xlab("Day of week") + 
  ylab("Average trips per scooter") + 
  ggtitle("Estimated average trips per scooter (adjusted for days logged)") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_fill_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)

Scooters seem to be used more on weekdays than on weekends. It should be noted that the data was logged during a period when many people are on summer holiday, and as a result few people use the scooters to commute to work.

We can do a similar analysis on average power levels.

df.power <- df %>% 
  group_by(operator, hour) %>%
  summarise(power = mean(power)) %>%
  ungroup()
df.power$hour <- as.numeric(df.power$hour)
p <- ggplot(df.power, aes(x = hour, y = power, colour=operator)) + 
  geom_line() +
  xlab("Hour of day") + 
  ylab("Average power level") + 
  ggtitle("Average power level") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_colour_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)

As one would expect, power levels seem to drop after a day of use. It seems that Tier has a tendency to charge most of their scooters during the night, while the others seem to charge more evenly throughout the day.

Estimating market size

We can estimate earnings per day for each of the operators.

df.start.perScooter <- df.trip[!is.na(df.trip$isStart),]
df.start.perScooter <- df.start.perScooter[!is.na(df.start.perScooter$time),]
df.start.perScooter$monthDay <- strftime(df.start.perScooter$time, format = "%m/%d")
df.start.perScooter <- df.start.perScooter[df.start.perScooter$isStart == 1,]
df.start.perScooter$incomePerRide <- 25

df.estimatedIncome <- df.start.perScooter  %>%
  group_by(operator, year, month, day) %>%
  summarise(EstimatedIncome = sum(incomePerRide))
df.estimatedIncome$date <- as.Date(with(df.estimatedIncome, paste(year, month, day,sep="-")),"%Y-%m-%d")
p <- ggplot(df.estimatedIncome, aes(x = date, y = EstimatedIncome, colour=operator)) + 
  geom_line() +
  xlab("Day") + 
  ylab("Estimated income (kr)") + 
  ggtitle("Estimated Income 25kr per ride") +
  theme(axis.text.x = element_text(angle = 90)) +
  scale_colour_manual(values=c("#F56600","#6DDDb2","#F46C62","#009F1F"))
ggplotly(p)

This however is not too accurate. If a scooter is moved over 50m for e.g. maintenance, this will register as a ride. If every scooter is moved once for non-trip purposes, that can quickly amount to errors of up to 50000kr for operators such as Voi and Tier, not to mention that 25kr might be a wrong estimate of the average income per ride.
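The size of this error is easy to reproduce. The sketch below assumes one spurious "ride" per scooter and uses the approximate peak fleet sizes read off the unique-id plot (about 2000 for Voi and 1250 for Tier); both the fleet sizes and the 25kr per ride are rough figures.

```r
income_per_ride_kr <- 25

# Approximate peak fleet sizes, read off the unique-id plot
fleet_size <- c(voi = 2000, tier = 1250)

# Overestimated income (kr) if every scooter is moved once for non-trip purposes
spurious_income_kr <- fleet_size * income_per_ride_kr
spurious_income_kr
```

This gives about 50000kr for Voi and a bit over 30000kr for Tier, which is the order of magnitude quoted above.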

Conclusion

We have seen that there are about 3500 scooters in Oslo. Voi seems to have south of 2000 scooters, while Tier seems to have north of 1000 scooters. The scooters seem to be used 3-6 times a day, and they seem to be more in use on weekdays.

What's next?

In the future I have planned to do the following work on the dataset